    Omen: discovering sequential patterns with reliable prediction delays

    Suppose we are given a discrete-valued time series X of observed events and an equally long binary sequence Y that indicates whether something of interest happened at each point in time. We consider the problem of mining serial episodes, that is, sequential patterns allowing for gaps, from X that reliably predict those interesting events. By reliable we mean patterns that not only predict that an interesting event is likely to follow, but that also let us accurately tell how long it will be until that event happens. In other words, we are specifically interested in patterns with a highly skewed distribution of delays between pattern occurrences and predicted events. As it is unlikely that a single pattern can explain a complex real-world process, we are after the smallest, least redundant set of such patterns that together explain the interesting events well. We formally define this problem in terms of the Minimum Description Length principle, by which we identify the best patterns as those that describe the occurrences of interesting events Y most succinctly given the data over X. As neither finding the optimal explanation of Y given a set of patterns nor finding the optimal pattern set allows for straightforward optimization, we break the problem in two and propose effective heuristics for both. Through extensive empirical evaluation, we show that both our main method, Omen, and its fast approximation, fOmen, work well in practice and beat the state of the art both quantitatively and qualitatively.
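
To make the notion of a delay distribution concrete, the following is a minimal sketch and not the Omen algorithm: it performs no MDL scoring and no pattern search. It only finds occurrences of one given serial episode in X, allowing gaps, and measures the delay from each occurrence to the next interesting event in Y. The function names, the `max_gap` parameter, the greedy matching rule, and the toy data are all assumptions made for illustration.

```python
from collections import Counter


def episode_ends(x, episode, max_gap=1):
    """End indices of greedy matches of `episode` in `x`.

    Illustrative assumption: at most `max_gap` symbols may be skipped
    between two consecutive events of the episode.
    """
    ends = []
    for start, symbol in enumerate(x):
        if symbol != episode[0]:
            continue
        pos = start
        for want in episode[1:]:
            # Candidate positions for the next episode event.
            window = x[pos + 1 : pos + 2 + max_gap]
            if want not in window:
                break
            pos = pos + 1 + window.index(want)
        else:
            # Every event of the episode was matched.
            ends.append(pos)
    return ends


def delays_to_next_event(ends, y):
    """Delay from each pattern end to the next time step where y == 1."""
    delays = []
    for end in ends:
        for t in range(end + 1, len(y)):
            if y[t] == 1:
                delays.append(t - end)
                break
    return delays


# Toy data (hypothetical): X is the event sequence, Y marks interesting events.
x = ["a", "b", "c", "d", "a", "c", "b", "d", "a", "b", "d", "d"]
y = [0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 1]

ends = episode_ends(x, ["a", "b"], max_gap=1)
delays = delays_to_next_event(ends, y)
histogram = Counter(delays)
print(ends, delays, histogram)
```

In this toy example every occurrence of the episode is followed by an interesting event after the same delay, so the delay histogram has all its mass on one value. That is the kind of highly skewed delay distribution the abstract calls reliable; a pattern whose delays were spread evenly over many values would predict that an event follows but not when.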